27 research outputs found

    : Méthodes d'Inférence Symbolique pour les Bases de Données

    Get PDF
    This dissertation is a summary of a line of research, that I wasactively involved in, on learning in databases from examples. Thisresearch focused on traditional as well as novel database models andlanguages for querying, transforming, and describing the schema of adatabase. In case of schemas our contributions involve proposing anoriginal languages for the emerging data models of Unordered XML andRDF. We have studied learning from examples of schemas for UnorderedXML, schemas for RDF, twig queries for XML, join queries forrelational databases, and XML transformations defined with a novelmodel of tree-to-word transducers.Investigating learnability of the proposed languages required us toexamine closely a number of their fundamental properties, often ofindependent interest, including normal forms, minimization,containment and equivalence, consistency of a set of examples, andfinite characterizability. Good understanding of these propertiesallowed us to devise learning algorithms that explore a possibly largesearch space with the help of a diligently designed set ofgeneralization operations in search of an appropriate solution.Learning (or inference) is a problem that has two parameters: theprecise class of languages we wish to infer and the type of input thatthe user can provide. We focused on the setting where the user inputconsists of positive examples i.e., elements that belong to the goallanguage, and negative examples i.e., elements that do not belong tothe goal language. In general using both negative and positiveexamples allows to learn richer classes of goal languages than usingpositive examples alone. However, using negative examples is oftendifficult because together with positive examples they may cause thesearch space to take a very complex shape and its exploration may turnout to be computationally challenging.Ce mémoire est une courte présentation d’une direction de recherche, à laquelle j’ai activementparticipé, sur l’apprentissage pour les bases de données à partir d’exemples. Cette recherches’est concentrée sur les modèles et les langages, aussi bien traditionnels qu’émergents, pourl’interrogation, la transformation et la description du schéma d’une base de données. Concernantles schémas, nos contributions consistent en plusieurs langages de schémas pour les nouveaumodèles de bases de données que sont XML non-ordonné et RDF. Nous avons ainsi étudiél’apprentissage à partir d’exemples des schémas pour XML non-ordonné, des schémas pour RDF,des requêtes twig pour XML, les requêtes de jointure pour bases de données relationnelles et lestransformations XML définies par un nouveau modèle de transducteurs arbre-à-mot.Pour explorer si les langages proposés peuvent être appris, nous avons été obligés d’examinerde près un certain nombre de leurs propriétés fondamentales, souvent souvent intéressantespar elles-mêmes, y compris les formes normales, la minimisation, l’inclusion et l’équivalence, lacohérence d’un ensemble d’exemples et la caractérisation finie. Une bonne compréhension de cespropriétés nous a permis de concevoir des algorithmes d’apprentissage qui explorent un espace derecherche potentiellement très vaste grâce à un ensemble d’opérations de généralisation adapté àla recherche d’une solution appropriée.L’apprentissage (ou l’inférence) est un problème à deux paramètres : la classe précise delangage que nous souhaitons inférer et le type d’informations que l’utilisateur peut fournir. Nousnous sommes placés dans le cas où l’utilisateur fournit des exemples positifs, c’est-à-dire deséléments qui appartiennent au langage cible, ainsi que des exemples négatifs, c’est-à-dire qui n’enfont pas partie. En général l’utilisation à la fois d’exemples positifs et négatifs permet d’apprendredes classes de langages plus riches que l’utilisation uniquement d’exemples positifs. Toutefois,l’utilisation des exemples négatifs est souvent difficile parce que les exemples positifs et négatifspeuvent rendre la forme de l’espace de recherche très complexe, et par conséquent, son explorationinfaisable

    Priority-Based Conflict Resolution in Inconsistent Relational Databases

    Full text link
    We study here the impact of priorities on conflict resolution in inconsistent relational databases. We extend the framework of repairs and consistent query answers. We propose a set of postulates that an extended framework should satisfy and consider two instantiations of the framework: (locally preferred) l-repairs and (globally preferred) g-repairs. We study the relationships between them and the impact each notion of repair has on the computational complexity of repair checking and consistent query answers

    Learning Twig and Path Queries

    Get PDF
    International audienceWe investigate the problem of learning XML queries, \emph{path} queries and \emph{twig} queries, from examples given by the user. A learning algorithm takes on the input a set of XML documents with nodes annotated by the user and returns a query that selects the nodes in a manner consistent with the annotation. We study two learning settings that differ with the types of annotations. In the first setting the user may only indicate \emph{required nodes} that the query must return. In the second, more general, setting, the user may also indicate \emph{forbidden nodes} that the query must not return. The query may or may not return any node with no annotation. We formalize what it means for a class of queries to be \emph{learnable}. One requirement is the existence of a learning algorithm that is \emph{sound} i.e., always returns a query consistent with the examples given by the user. Furthermore, the learning algorithm should be \emph{complete} i.e., able to produce every query with sufficiently rich examples. Other requirements involve tractability of the learning algorithm and its robustness to nonessential examples. We identify practical classes of Boolean and unary, path and twig queries that are learnable from positive examples. We also show that adding negative examples to the picture renders learning unfeasible

    Apprentissage relationnel polynomial pour la classification d'arbres

    Get PDF
    National audienceAprès avoir rappelé le cadre général de la programmation logique inductive, nous proposons une sous-famille des clauses de Horn nommée MQD. Visant des applications de classification de document XML, nous définissons un langage de clauses permettant de représenter des arbres et des motifs d'arbres. Ce langage nous fournit exemples et hypothèses. Nous montrons que ce langage est inclus dans les MQD et proposons des algorithmes dédiés pour les opérations de base nécessaires à l'apprentissage, à savoir les calculs de theta-subsomption et de moindre généralisé. Nos algorithmes étant polynomiaux et non exponentiels comme dans le cas général des clauses de Horn, ils peuvent participer à la classification supervisée d'arbres, l'apprentissage relationnel devenant alors de complexité polynomiale

    Containment of Shape Expression Schemas for RDF

    Get PDF
    International audienceWe study the problem of containment of shape expression schemas (ShEx) for RDF graphs. We identify a subclass of ShEx that has a natural graphical representation in the form of shape graphs and whose semantics is captured with a tractable notion of embedding of an RDF graph in a shape graph. When applied to pairs of shape graphs, an embedding is a sufficient condition for containment, and for a practical subclass of deterministic shape graphs, it is also a necessary one, thus yielding a subclass with tractable containment. Containment for general shape graphs is EXP-complete. Finally , we show that containment for arbitrary ShEx is decid-able. CCS CONCEPTS • Information systems → Graph-based database models ; Resource Description Framework (RDF); • Theory of computation → Database theory; Database interoper-ability

    A Supervised Approach for Rhythm Transcription Based on Tree Series Enumeration

    Get PDF
    International audienceWe present a rhythm transcription system integrated in the computer-assisted composition environment OpenMusic. Rhythm transcription consists in translating a series of dated events into traditional music notation's pulsed and structured representation. As transcription is equivocal, our system favors interactions with the user to reach a satisfactory compromise between various criteria, in particular the precision of the transcription and the readability of the output score. It is based on a uniform approach, using a hierarchical representation of duration notation in the form of rhythm trees, and an efficient dynamic-programming algorithm that lazily evaluates the transcription solutions. It is run through a dedicated user interface allowing to interactively explore the solution set, visualize the solutions and locally edit them

    Static Analysis of Graph Database Transformations

    Full text link
    We investigate graph transformations, defined using Datalog-like rules based on acyclic conjunctive two-way regular path queries (acyclic C2RPQs), and we study two fundamental static analysis problems: type checking and equivalence of transformations in the presence of graph schemas. Additionally, we investigate the problem of target schema elicitation, which aims to construct a schema that closely captures all outputs of a transformation over graphs conforming to the input schema. We show all these problems are in EXPTIME by reducing them to C2RPQ containment modulo schema; we also provide matching lower bounds. We use cycle reversing to reduce query containment to the problem of unrestricted (finite or infinite) satisfiability of C2RPQs modulo a theory expressed in a description logic

    Static analysis of XML security views and query rewriting

    Get PDF
    International audienceIn this paper, we revisit the view based security framework for XML without imposing any of the previously considered restrictions on the class of queries, the class of DTDs, and the type of annotations used to define the view. First, we study {\em query rewriting} with views when the classes used to define queries and views are Regular XPath and MSO. Next, we investigate problems of {\em static analysis} of security access specifications (SAS): we introduce the novel class of \emph{interval-bounded} SAS and we define three different manners to compare views (i.e. queries), with a security point of view. We provide a systematic study of the complexity for deciding these three comparisons, when the depth of the XML documents is bounded, when the document may have an arbitrary depth but the queries defining the views are restricted to guarantee the interval-bounded property, and in the general setting without restriction on queries and document

    XML Security Views Revisited

    Get PDF
    International audienceIn this paper, we revisit the view based security framework for XML without imposing any of the previously considered restrictions on the class of queries, the class of DTDs, and the type of annotations used to dene the view. First, we show that the full class of Regular XPath queries is closed under query rewriting. Next, we address the problem of constructing a DTD that describes the view schema, which in general needs not be regular. We propose three dierent methods of ap- proximating the view schema and we show that the produced DTDs are indistinguishable from the exact schema (with queries from a class speci c for each method). Finally, we investigate problems of static analysis of security access specications
    corecore